A Source Localization/Separation/Respatialization System Based on Unsupervised Classification of Interaural Cues
In this paper we propose a complete computational system for Auditory Scene Analysis. This time-frequency system localizes, separates, and spatializes an arbitrary number of audio sources given only binaural signals. The localization builds on recent research frameworks in which interaural level and time differences are combined to derive a confident direction of arrival (azimuth) at each frequency bin. Here, the power-weighted histogram constructed in azimuth space is modeled as a Gaussian Mixture Model, whose parameter structure is revealed through a weighted Expectation Maximization. Afterwards, a bank of Gaussian spatial filters is configured automatically to extract the sources with significant energy according to a posterior probability. In this frequency-domain framework, we also invert a geometrical and physical head model to derive an algorithm that simulates a source as originating from any azimuth angle.
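The weighted Expectation Maximization step mentioned in this abstract can be illustrated with a minimal sketch. It assumes one azimuth estimate and one power value per time-frequency bin, and fits a one-dimensional Gaussian mixture in which each bin contributes in proportion to its power. The function name `weighted_em_gmm_1d`, the quantile initialisation, and the variance floor are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def weighted_em_gmm_1d(x, w, k, n_iter=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture to samples x with weights w.

    x: azimuth estimates (one per time-frequency bin), in degrees.
    w: non-negative weights (e.g. bin power); normalised internally.
    Returns mixture priors, means, variances, and per-sample posteriors.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                                   # weights now sum to 1

    # Assumed initialisation: means at evenly spaced quantiles of the data.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, np.sum(w * (x - np.sum(w * x)) ** 2) + var_floor)
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample,
        # computed in the log domain for numerical stability.
        log_p = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: every sufficient statistic is scaled by the sample weight.
        wr = w[:, None] * resp
        nk = wr.sum(axis=0) + 1e-12
        mu = (wr * x[:, None]).sum(axis=0) / nk
        var = (wr * (x[:, None] - mu) ** 2).sum(axis=0) / nk + var_floor
        pi = nk / nk.sum()

    return pi, mu, var, resp
```

The returned posteriors play the role of the posterior probability mentioned in the abstract: for each bin they give the probability that it belongs to each fitted component.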
Performance of source spatialization and source localization Algorithms using Conjoint Models of Interaural Level and Time Cues
In this paper, we describe a head model based on interaural cues (e.g., interaural level differences and interaural time differences). Based on this model, we proposed in previous work a binaural source spatialization method (SSPA), which we extended to a multispeaker spatialization technique that operates on a speaker array in a pairwise manner (MSPA) [1], [2]. Here, we evaluate these spatialization techniques and compare them to well-known methods such as Vector Base Amplitude Panning (VBAP) [3]. We also test the robustness of an adapted conjoint localization method under noisy and reverberant conditions; this method uses the spectra of recorded binaural signals and seeks to minimize the distance between the ILD-based and ITD-based azimuth estimates. We show comparative results with the PHAT generalized cross-correlation localization method [4].
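For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of time-delay estimation with the PHAT-weighted generalized cross-correlation. It is the textbook formulation, not the implementation evaluated in the paper; the function name `gcc_phat_delay` and its parameters are illustrative assumptions.

```python
import numpy as np

def gcc_phat_delay(left, right, fs, max_tau=None):
    """Estimate the delay (in seconds) of `left` relative to `right`.

    A positive result means the left channel lags the right channel.
    max_tau optionally bounds the search to physically plausible delays.
    """
    n = len(left) + len(right)              # zero-pad to avoid circular wrap
    L = np.fft.rfft(left, n=n)
    R = np.fft.rfft(right, n=n)

    cross = L * np.conj(R)
    cross /= np.maximum(np.abs(cross), 1e-12)   # PHAT: keep phase only

    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index 0 corresponds to a lag of -max_shift samples.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs
```

The estimated delay would then be mapped to an azimuth through a head model; that mapping is specific to the model used and is not sketched here.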
MOSPALOSEP: A Platform for the Binaural Localization and Separation of Spatial Sounds using Models of Interaural Cues and Mixture Models
In this paper, we present the MOSPALOSEP platform for the localization and separation of binaural signals. Our methods use short-time spectra of the recorded binaural signals. Based on a parametric model of the binaural mix, we exploit the joint evaluation of interaural cues to derive the location of each time-frequency bin. We then describe different approaches to localization: some based on an energy-weighted histogram in azimuth space, and others based on unsupervised identification of the number of sources using a Gaussian Mixture Model combined with the Minimum Description Length criterion. We then use the revealed Gaussian Mixture Model structure to identify the region dominated by each source in a multi-source mix. A bank of spatial masks allows the extraction of each source, using either the posterior probability or Maximum Likelihood binary masks. An important condition is the Windowed-Disjoint Orthogonality of the sources in the time-frequency domain. We assess the source separation algorithms specifically on mixes of instruments, where this fundamental condition is not satisfied.
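The two kinds of spatial masks mentioned in this abstract can be sketched as follows, assuming a Gaussian mixture over azimuth has already been fitted and that an azimuth estimate is available for every time-frequency bin. The function name `gmm_masks` is hypothetical. Building the binary mask from the component with the highest class-conditional likelihood is one reading of "Maximum Likelihood" and is an assumption, not a detail confirmed by the abstract.

```python
import numpy as np

def gmm_masks(azimuth, pi, mu, var):
    """Build soft and binary time-frequency masks from a 1-D azimuth GMM.

    azimuth: array of shape (freq, time) with one azimuth estimate per bin.
    pi, mu, var: mixture priors, means, and variances (length n_sources).
    Returns (soft, binary), each of shape (n_sources, freq, time).
    """
    az = np.asarray(azimuth, dtype=float)[..., None]
    pi, mu, var = (np.asarray(a, dtype=float) for a in (pi, mu, var))

    # Class-conditional log-likelihood of each bin under each component.
    log_lik = -0.5 * np.log(2 * np.pi * var) - 0.5 * (az - mu) ** 2 / var

    # Soft mask: posterior probability of each component given the azimuth.
    log_post = log_lik + np.log(pi)
    log_post -= log_post.max(axis=-1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=-1, keepdims=True)
    soft = np.moveaxis(post, -1, 0)

    # Binary mask (assumed reading): each bin goes to the most likely component.
    winners = log_lik.argmax(axis=-1)
    binary = (winners[None] == np.arange(mu.size)[:, None, None]).astype(float)
    return soft, binary
```

Multiplying a mask element-wise with the short-time spectrum of a channel gives an estimate of one source's spectrum. A binary mask is only a reasonable choice when the sources rarely overlap in the same bin, which is the Windowed-Disjoint Orthogonality condition the abstract refers to.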